
Fix realtime ingestion when an entire batch of messages is filtered out #7927

Merged 7 commits into apache:master from rt-ingestion-stuck on Dec 21, 2021

Conversation

richardstartin (Member):
eb6800da96a44e6f5125097cc99a368c2f8f8847 creates conditions where LLRealtimeSegmentDataManager cannot advance if a message batch consumed from Kafka is entirely filtered out, e.g. because it only contains tombstones. LLCRealtimeClusterIntegrationTest hangs on this commit.

01f9f16a18b188bd3e15408ad945eda9541caf75 allows offsets to be committed when the entire batch was filtered out. It also fixes some of the myriad warnings and concurrency bugs in LLRealtimeSegmentDataManager.
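
To make the failure mode concrete, here is a minimal, self-contained sketch; the class and method names are hypothetical stand-ins, not the real LLRealtimeSegmentDataManager or MessageBatch code. If the consuming loop only advances its offset while indexing messages, a batch whose contents were entirely filtered out leaves the offset untouched, so the same position is fetched over and over.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the real stream classes; a sketch of the bug, not Pinot code.
public class StuckConsumerSketch {

  static final class Batch {
    final List<String> messages;   // messages that survived filtering
    final int unfilteredCount;     // messages actually read from the stream
    final long lastOffset;         // offset after the last read message

    Batch(List<String> messages, int unfilteredCount, long lastOffset) {
      this.messages = messages;
      this.unfilteredCount = unfilteredCount;
      this.lastOffset = lastOffset;
    }
  }

  // Simulates a fetch that reads 5 tombstones and filters all of them out.
  static Batch fetch(long fromOffset) {
    return new Batch(Collections.emptyList(), 5, fromOffset + 5);
  }

  public static void main(String[] args) {
    long currentOffset = 0;
    for (int round = 0; round < 3; round++) {
      Batch batch = fetch(currentOffset);
      for (String message : batch.messages) {
        // indexing would happen here; the offset only advances inside this loop
        currentOffset++;
      }
      // Every round fetches from offset 0 again: the consumer is stuck.
      System.out.println("round " + round + ", offset = " + currentOffset);
    }
  }
}
```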

} catch (PermanentConsumerException e) {
_segmentLogger.warn("Permanent exception from stream when fetching messages, stopping consumption", e);
throw e;
} catch (Exception e) {
// all exceptions but PermanentConsumerException are handled the same way
// can be a TimeoutException or TransientConsumerException routinely

richardstartin (Member Author):
Collapsed identical catch blocks for the sake of the person reading the code.
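
As an illustration of the pattern (the exception and method names below are hypothetical, not the real consumer code): the one unrecoverable exception is rethrown, and every other failure is handled by a single catch block instead of several identical ones.

```java
// Sketch of the collapsed-catch pattern with hypothetical types.
class PermanentConsumerException extends RuntimeException {
  PermanentConsumerException(String message) {
    super(message);
  }
}

class FetchLoopSketch {
  void fetchOnce() {
    try {
      fetchMessages();
    } catch (PermanentConsumerException e) {
      // Unrecoverable: stop consumption by propagating the exception.
      throw e;
    } catch (Exception e) {
      // Timeouts, transient consumer errors, etc. are routine: log once and
      // let the enclosing loop retry on its next iteration.
      System.err.println("Transient fetch failure, will retry: " + e);
    }
  }

  private void fetchMessages() {
    // Stand-in for the real stream consumer call.
  }
}
```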

Comment on lines 436 to 430
} else if (messageBatch.getUnfilteredMessageCount() > 0) {
// we consumed something from the stream but filtered all the content out,
// so we need to advance the offsets to avoid getting stuck
_currentOffset = messageBatch.getLastOffset();
lastUpdatedOffset = _streamPartitionMsgOffsetFactory.create(_currentOffset);

richardstartin (Member Author):
This is the bug fix: it ensures that we advance after consuming a bad batch.
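
A hedged sketch of the fixed control flow, using simplified, hypothetical names rather than the actual LLRealtimeSegmentDataManager fields: when a non-empty batch is filtered down to nothing, the offset jumps to the batch's last offset instead of staying put.

```java
import java.util.Collections;
import java.util.List;

// Simplified illustration of the fixed branch; names are hypothetical.
public class AdvanceOnFilteredBatchSketch {

  static long processBatch(List<String> filteredMessages, int unfilteredCount, long lastOffset,
      long currentOffset) {
    if (!filteredMessages.isEmpty()) {
      for (String message : filteredMessages) {
        index(message);
        currentOffset++;           // normal path: advance as each message is indexed
      }
    } else if (unfilteredCount > 0) {
      // The fix: the stream returned messages but all of them were filtered out,
      // so jump to the batch's last offset instead of re-reading the same range.
      currentOffset = lastOffset;
    }
    // A genuinely empty fetch leaves the offset unchanged.
    return currentOffset;
  }

  static void index(String message) {
    // Stand-in for indexing into the consuming segment.
  }

  public static void main(String[] args) {
    // A batch of 5 tombstones, all filtered out, ending at offset 42.
    System.out.println(processBatch(Collections.emptyList(), 5, 42L, 0L)); // prints 42
  }
}
```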

}
updateCurrentDocumentCountMetrics();
if (streamMessageCount != 0) {
_segmentLogger.debug("Indexed {} messages ({} messages read from stream) current offset {}", indexedMessageCount,
streamMessageCount, _currentOffset);
} else {
} else if (messagesAndOffsets.getUnfilteredMessageCount() == 0) {

richardstartin (Member Author):
Prevents unnecessary latency when there has been a bad batch; there is probably more data waiting to be consumed.
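
A sketch of the intent behind the changed condition (names are hypothetical): the idle back-off should only apply when the stream really returned nothing; after a batch that was read but fully filtered out, the loop should fetch again immediately.

```java
// Hypothetical illustration of when to back off between fetches.
public class IdleHandlingSketch {

  static long chooseSleepMillis(int indexedMessageCount, int unfilteredMessageCount, long idleSleepMillis) {
    if (indexedMessageCount > 0) {
      return 0L;                 // we indexed something; keep consuming
    } else if (unfilteredMessageCount == 0) {
      return idleSleepMillis;    // truly nothing on the stream: back off
    } else {
      // Messages were read but all filtered out; more data is likely waiting,
      // so do not add latency by sleeping.
      return 0L;
    }
  }

  public static void main(String[] args) {
    System.out.println(chooseSleepMillis(0, 5, 100L)); // 0: fetch again at once
    System.out.println(chooseSleepMillis(0, 0, 100L)); // 100: idle back-off
  }
}
```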

@@ -63,6 +63,11 @@
private final boolean _enableLeadControllerResource = RANDOM.nextBoolean();
private final long _startTime = System.currentTimeMillis();

@Override
protected boolean injectTombstones() {
return true;

richardstartin (Member Author):
Set this to false to make the test pass at eb6800da96a44e6f5125097cc99a368c2f8f8847
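
For context, a Kafka tombstone is simply a record with a null value. A test hook along these lines could inject them with the standard producer API; the class name, topic handling, and serializer choices here are assumptions for illustration, not the integration test's actual code.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

// Hedged sketch: publishes null-value records (tombstones) to a test topic.
public class TombstoneInjector {

  public static void injectTombstones(String bootstrapServers, String topic, int count) {
    Properties props = new Properties();
    props.put("bootstrap.servers", bootstrapServers);
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", ByteArraySerializer.class.getName());
    try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
      for (int i = 0; i < count; i++) {
        // A null value is a tombstone: it marks the key for deletion on
        // compacted topics and is exactly the kind of record ingestion filters out.
        producer.send(new ProducerRecord<String, byte[]>(topic, "key-" + i, null));
      }
      producer.flush();
    }
  }
}
```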

throws IOException {
super.close();
List<ConsumerRecord<String, Bytes>> messageAndOffsets = consumerRecords.records(_topicPartition);
List<MessageAndOffset> filtered = new ArrayList<>(messageAndOffsets.size());

richardstartin (Member Author):
Note that a list was being materialised anyway in KafkaMessageBatch; it's just easier to do it here because we can also capture the last offset. This is likely more efficient than using Iterables.filter anyway.
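
The shape of that filtering pass, sketched against the standard kafka-clients types; the class and variable names (and treating a null value as the filter criterion) are illustrative assumptions, not the plugin's exact code.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.utils.Bytes;

// Hedged sketch: drop null-value records (tombstones) while remembering the
// last offset seen, so the caller can advance even if nothing survives.
public class FilterWithLastOffsetSketch {

  static final class Result {
    final List<byte[]> filtered;
    final long lastOffset;

    Result(List<byte[]> filtered, long lastOffset) {
      this.filtered = filtered;
      this.lastOffset = lastOffset;
    }
  }

  static Result filter(List<ConsumerRecord<String, Bytes>> records) {
    List<byte[]> filtered = new ArrayList<>(records.size());
    long lastOffset = -1L;
    for (ConsumerRecord<String, Bytes> record : records) {
      lastOffset = record.offset();   // captured whether or not the record is kept
      Bytes value = record.value();
      if (value != null) {            // a null value is a tombstone: skip it
        filtered.add(value.get());
      }
    }
    return new Result(filtered, lastOffset);
  }
}
```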

@@ -40,7 +40,7 @@
* versions of the stream implementation
*/
@InterfaceStability.Evolving
public interface StreamPartitionMsgOffset extends Comparable {
public interface StreamPartitionMsgOffset extends Comparable<StreamPartitionMsgOffset> {

richardstartin (Member Author):
The raw type caused a lot of warnings, which make the code harder to read; parameterising it also helps the compiler prevent heap pollution bugs.
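
A small illustration of the difference, using stripped-down stand-ins rather than the real StreamPartitionMsgOffset implementations: with the raw Comparable the compareTo parameter is effectively an Object and call sites produce unchecked warnings, whereas parameterising the interface makes the comparison type-checked at compile time.

```java
// Hedged sketch of the parameterised interface and one implementation.
interface StreamPartitionMsgOffsetSketch extends Comparable<StreamPartitionMsgOffsetSketch> {
}

final class LongOffsetSketch implements StreamPartitionMsgOffsetSketch {
  private final long _offset;

  LongOffsetSketch(long offset) {
    _offset = offset;
  }

  @Override
  public int compareTo(StreamPartitionMsgOffsetSketch other) {
    // The parameter is the interface type rather than Object, so callers get
    // compile-time checking and no unchecked warnings at the call site.
    return Long.compare(_offset, ((LongOffsetSketch) other)._offset);
  }
}
```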

* @return number of messages returned from the stream
*/
default int getUnfilteredMessageCount() {
return getMessageCount();

richardstartin (Member Author):
Implemented for Kafka 2.0 only.
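
A sketch of how the default method plus a single override behaves (hypothetical types, not the real MessageBatch SPI): implementations that never filter can ignore the new method, while a filtering batch reports how many records it read before filtering.

```java
import java.util.List;

// Hedged sketch of the default-method pattern.
interface MessageBatchSketch<T> {
  List<T> getMessages();

  default int getMessageCount() {
    return getMessages().size();
  }

  // Streams that never filter can rely on this default.
  default int getUnfilteredMessageCount() {
    return getMessageCount();
  }
}

final class FilteringBatchSketch implements MessageBatchSketch<byte[]> {
  private final List<byte[]> _filtered;
  private final int _unfilteredCount;

  FilteringBatchSketch(List<byte[]> filtered, int unfilteredCount) {
    _filtered = filtered;
    _unfilteredCount = unfilteredCount;
  }

  @Override
  public List<byte[]> getMessages() {
    return _filtered;
  }

  @Override
  public int getUnfilteredMessageCount() {
    // Reports what was actually read from the stream, even if everything
    // was filtered out before reaching the caller.
    return _unfilteredCount;
  }
}
```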

codecov-commenter commented Dec 18, 2021:
Codecov Report

Merging #7927 (29e5d20) into master (58e7f10) will increase coverage by 0.03%.
The diff coverage is 84.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7927      +/-   ##
============================================
+ Coverage     71.08%   71.11%   +0.03%     
- Complexity     4109     4116       +7     
============================================
  Files          1593     1593              
  Lines         82373    82379       +6     
  Branches      12269    12274       +5     
============================================
+ Hits          58555    58586      +31     
+ Misses        19867    19852      -15     
+ Partials       3951     3941      -10     
Flag Coverage Δ
integration1 29.01% <84.00%> (+0.01%) ⬆️
integration2 27.64% <60.00%> (+0.07%) ⬆️
unittests1 68.11% <0.00%> (+0.05%) ⬆️
unittests2 14.33% <56.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...in/stream/kinesis/KinesisPartitionGroupOffset.java 17.39% <ø> (ø)
...ava/org/apache/pinot/spi/stream/LongMsgOffset.java 92.30% <ø> (ø)
...java/org/apache/pinot/spi/stream/MessageBatch.java 0.00% <0.00%> (ø)
...in/stream/kafka20/KafkaPartitionLevelConsumer.java 85.00% <84.61%> (-1.67%) ⬇️
...manager/realtime/LLRealtimeSegmentDataManager.java 73.35% <100.00%> (+1.53%) ⬆️
...pinot/plugin/stream/kafka20/KafkaMessageBatch.java 92.30% <100.00%> (+0.64%) ⬆️
...ller/helix/core/minion/TaskTypeMetricsUpdater.java 80.00% <0.00%> (-20.00%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 47.66% <0.00%> (-9.85%) ⬇️
.../org/apache/pinot/core/startree/StarTreeUtils.java 69.89% <0.00%> (-2.16%) ⬇️
...e/pinot/segment/local/io/util/PinotDataBitSet.java 95.62% <0.00%> (-1.46%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58e7f10...29e5d20.

Jackie-Jiang (Contributor) left a comment:
Good catch! Thanks for adding the test

richardstartin force-pushed the rt-ingestion-stuck branch 3 times, most recently from 3e3184b to 5214304 on December 20, 2021 at 21:16

Jackie-Jiang (Contributor) left a comment:
LGTM

sajjad-moradi (Contributor) left a comment:
LGTM, with a few minor suggestions.

sajjad-moradi (Contributor):
On a separate note, I saw that originally there were some optimizations piggy-backed onto this PR. We should avoid doing that. Each PR should only focus on one feature, one bug fix, etc. Any optimization or refactoring should go in a separate PR. That might take a little longer for the author, but it benefits all of us in the long run.
One example of this happened last week, when we spent a long time finding the root cause of the chunk index writer bug. The problematic optimization was added in a PR titled "Add MV raw forward index and MV BYTES data type". The table with the performance problem didn't have any multi-value columns or BYTES data type, so we couldn't easily tell whether that commit was related, and there were a lot of commits to examine. If we had a separate PR for that optimization, we could've found the root cause much more easily!

richardstartin (Member Author) commented Dec 21, 2021:
> On a separate note, I saw that originally there were some optimizations piggy-backed onto this PR. We should avoid doing that. Each PR should only focus on one feature, one bug fix, etc. Any optimization or refactoring should go in a separate PR. That might take a little longer for the author, but it benefits all of us in the long run.

There were no optimisations in this PR; there were some fixes for concurrency bugs (e.g. it is incorrect to use an increment operator on a volatile variable), so I'm not sure what you're referring to.
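
For readers unfamiliar with that class of bug, a minimal, self-contained demonstration (not Pinot code): ++ on a volatile field is a separate read, add, and write, so concurrent increments can be lost, whereas an AtomicLong performs the update atomically.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal demonstration that `count++` on a volatile field is not atomic.
public class VolatileIncrementDemo {
  private static volatile long brokenCount = 0;           // lossy under contention
  private static final AtomicLong safeCount = new AtomicLong();

  public static void main(String[] args) throws InterruptedException {
    Runnable task = () -> {
      for (int i = 0; i < 100_000; i++) {
        brokenCount++;                 // read, add, write: increments can be lost
        safeCount.incrementAndGet();   // single atomic read-modify-write
      }
    };
    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    // brokenCount is typically well below 200000; safeCount is exactly 200000.
    System.out.println("volatile++ : " + brokenCount);
    System.out.println("AtomicLong : " + safeCount.get());
  }
}
```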

> One example of this happened last week, when we spent a long time finding the root cause of the chunk index writer bug. The problematic optimization was added in a PR titled "Add MV raw forward index and MV BYTES data type".

I'm not sure this is the best place to be discussing this, but you are implying that the change you mentioned was unrelated to the PR it was made in, when it wasn't. The purpose of the change, as has been discussed, is that the particular class creates very large buffers. It was included in that PR because its use for MV BYTES columns exacerbates this by multiplying what is already an overestimate of the buffer size by the maximum number of elements in the column. So the change was a mitigation for a worst case made worse by that PR.

> The table with the performance problem didn't have any multi-value columns or BYTES data type, so we couldn't easily tell whether that commit was related, and there were a lot of commits to examine. If we had a separate PR for that optimization, we could've found the root cause much more easily!

Looking at commits to figure out what changed isn't an efficient diagnostic technique. Had you instead used a profiler, you would have found that the process was spending a large amount of time in the syscalls mmap and munmap; looking at the git blame for each Pinot stack frame above those syscalls would have found the culprit in O(stack depth) time rather than O(lines of code changed).

Jackie-Jiang merged commit f8c7e1f into apache:master on Dec 21, 2021

sajjad-moradi (Contributor):
@richardstartin I don't want to get into details here. My point was that one PR should focus on one thing, not more!
